11 research outputs found

    CCL: a portable and tunable collective communication library for scalable parallel computers

    Get PDF
    A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model

    Designing Broadcasting Algorithms in the Postal Model for Message-Passing Systems

    No full text
    In many distributed-memory parallel computers and high-speed communication networks, the exact structure of the underlying communication network may be ignored. These systems assume that the network creates a complete communication graph between the processors, in which passing messages is associated with communication latencies. In this paper, we explore the impact of communication latencies on the design of broadcasting algorithms for fully-connected message-passing systems. For this purpose, we introduce the postal model that incorporates a communication latency parameter 1. This parameter measures the inverse of the ratio between the time it takes an originator of a message to send the message and the time that passes until the recipient of the message receives it. We present an optimal algorithm for broadcasting one message in systems with n processors and communication latency , the running time of which is \Theta( log n log(+1) ). For broadcasting m 1 messages, we first e..

    Organization of systems with bussed interconnections

    No full text
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1990.Includes bibliographical references (p. 113-120).by Shlomo Kipnis.Ph.D
    corecore